87 research outputs found
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
Recent advances in 3D content creation mostly leverage optimization-based 3D
generation via score distillation sampling (SDS). Though promising results have
been exhibited, these methods often suffer from slow per-sample optimization,
limiting their practical usage. In this paper, we propose DreamGaussian, a
novel 3D content generation framework that achieves both efficiency and quality
simultaneously. Our key insight is to design a generative 3D Gaussian Splatting
model with companion mesh extraction and texture refinement in UV space. In
contrast to the occupancy pruning used in Neural Radiance Fields, we
demonstrate that the progressive densification of 3D Gaussians converges
significantly faster for 3D generative tasks. To further enhance the texture
quality and facilitate downstream applications, we introduce an efficient
algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning
stage to refine the details. Extensive experiments demonstrate the superior
efficiency and competitive generation quality of our proposed approach.
Notably, DreamGaussian produces high-quality textured meshes in just 2 minutes
from a single-view image, achieving approximately 10 times acceleration
compared to existing methods.
Comment: project page: https://dreamgaussian.github.io
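As a rough, self-contained illustration of the progressive-densification idea, the numpy sketch below clones Gaussians whose accumulated view-space gradient magnitude exceeds a threshold. The positions, gradient values, threshold, and jitter scale are all synthetic stand-ins, not the paper's actual criteria.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy set of 3D Gaussians: positions plus a per-Gaussian view-space
# gradient magnitude accumulated during optimization (all synthetic here).
positions = rng.normal(size=(100, 3))
grad_mag = rng.random(100)

def densify(positions, grad_mag, thresh=0.8, jitter=0.01):
    """Clone Gaussians whose accumulated gradient exceeds `thresh`,
    nudging the copies slightly so the two halves can specialize."""
    hot = grad_mag > thresh
    clones = positions[hot] + rng.normal(scale=jitter, size=(int(hot.sum()), 3))
    return np.vstack([positions, clones])

dense = densify(positions, grad_mag)
print(len(positions), "->", len(dense))
```

In the real method, densification decisions are interleaved with rendering-loss optimization; this toy only shows the "grow where the gradient says the representation is under-parameterized" selection step.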
Delicate Textured Mesh Recovery from NeRF via Adaptive Surface Refinement
Neural Radiance Fields (NeRF) have constituted a remarkable breakthrough in
image-based 3D reconstruction. However, their implicit volumetric
representations differ significantly from the widely-adopted polygonal meshes
and lack support from common 3D software and hardware, making their rendering
and manipulation inefficient. To overcome this limitation, we present a novel
framework that generates textured surface meshes from images. Our approach
begins by efficiently initializing the geometry and view-dependency decomposed
appearance with a NeRF. Subsequently, a coarse mesh is extracted, and an
iterative surface refining algorithm is developed to adaptively adjust both
vertex positions and face density based on re-projected rendering errors. We
jointly refine the appearance with geometry and bake it into texture images for
real-time rendering. Extensive experiments demonstrate that our method achieves
superior mesh quality and competitive rendering quality.
Comment: ICCV 2023 camera-ready, Project Page: https://me.kiui.moe/nerf2mes
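The adaptive face-density policy, subdividing faces where re-projected rendering error is high and simplifying where it is low, can be sketched as follows. The per-face errors and the quantile thresholds are synthetic stand-ins, not the paper's actual error measure or schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-face re-projected rendering error for a 1,000-face mesh.
face_error = rng.random(1000)

# Adaptive policy: subdivide the worst-fitting faces (more density),
# mark the best-fitting ones for decimation, leave the rest untouched.
hi, lo = np.quantile(face_error, [0.9, 0.1])
subdivide = np.flatnonzero(face_error >= hi)
decimate = np.flatnonzero(face_error <= lo)

print(len(subdivide), "faces to subdivide,", len(decimate), "to decimate")
```

Iterating this selection while re-optimizing vertex positions against the rendering loss is the essence of the refinement loop the abstract describes.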
DFT-Spread Spectrally Overlapped Hybrid OFDM-Digital Filter Multiple Access IMDD PONs
A novel transmission technique, namely a DFT-spread spectrally overlapped hybrid OFDM-digital filter multiple access (DFMA) PON based on intensity modulation and direct detection (IMDD), is proposed here by employing the discrete Fourier transform (DFT)-spread technique in each optical network unit (ONU) and the optical line terminal (OLT). Detailed numerical simulations are carried out to identify optimal ONU transceiver parameters and to explore the maximum achievable upstream transmission performance on IMDD PON systems. The results show that the DFT-spread technique in the proposed PON is effective in enhancing the upstream transmission performance to its maximum potential, whilst still maintaining all of the salient features associated with previously reported PONs. Compared with previously reported PONs excluding DFT-spread, a significant peak-to-average power ratio (PAPR) reduction of over 2 dB is achieved, leading to a 1 dB reduction in the optimal signal clipping ratio (CR). As a direct consequence of the PAPR reduction, the proposed PON has excellent tolerance to reduced digital-to-analogue converter/analogue-to-digital converter (DAC/ADC) bit resolution, and can therefore operate with a minimum DAC/ADC resolution of only 6 bits at the forward error correction (FEC) limit (1 × 10⁻³). In addition, the proposed PON can improve the upstream power budget by >1.4 dB and increase the aggregate upstream signal transmission rate by up to 10% without degrading nonlinearity tolerances.
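The PAPR benefit of DFT-spreading can be reproduced qualitatively with a small numpy experiment: QPSK data are mapped onto 64 of 256 subcarriers either directly (conventional OFDM) or after an M-point DFT pre-spread, as in SC-FDMA-style DFMA. The subcarrier counts and number of trials below are illustrative, not the parameters of the simulated PON.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, SYMS = 64, 256, 400          # data symbols, IFFT size, trials

def papr_db(x):
    # peak-to-average power ratio of one time-domain symbol, in dB
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

def qpsk(n):
    b = rng.integers(0, 2, (n, 2))
    return ((2 * b[:, 0] - 1) + 1j * (2 * b[:, 1] - 1)) / np.sqrt(2)

plain, spread = [], []
for _ in range(SYMS):
    d = qpsk(M)
    # Conventional OFDM: map data straight onto M of N subcarriers.
    X = np.zeros(N, complex)
    X[:M] = d
    plain.append(papr_db(np.fft.ifft(X)))
    # DFT-spread: pre-spread with an M-point DFT before subcarrier mapping.
    Xs = np.zeros(N, complex)
    Xs[:M] = np.fft.fft(d) / np.sqrt(M)
    spread.append(papr_db(np.fft.ifft(Xs)))

print(f"mean PAPR reduction: {np.mean(plain) - np.mean(spread):.2f} dB")
```

The reduction appears because DFT-spreading restores a near single-carrier time-domain envelope; a lower PAPR is what relaxes the clipping ratio and the DAC/ADC bit-resolution requirements discussed in the abstract.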
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans
Despite recent research advancements in reconstructing clothed humans from a
single image, accurately restoring the "unseen regions" with high-level details
remains an unsolved challenge that lacks attention. Existing methods often
generate overly smooth back-side surfaces with a blurry texture. But how can we
effectively capture, from a single image, all the visual attributes of an
individual that are sufficient to reconstruct unseen areas (e.g., the back view)?
Motivated by the power of foundation models, TeCH reconstructs the 3D human by
leveraging 1) descriptive text prompts (e.g., garments, colors, hairstyles)
which are automatically generated via a garment parsing model and Visual
Question Answering (VQA), 2) a personalized fine-tuned Text-to-Image diffusion
model (T2I) which learns the "indescribable" appearance. To represent
high-resolution 3D clothed humans at an affordable cost, we propose a hybrid 3D
representation based on DMTet, which consists of an explicit body shape grid
and an implicit distance field. Guided by the descriptive prompts +
personalized T2I diffusion model, the geometry and texture of the 3D humans are
optimized through multi-view Score Distillation Sampling (SDS) and
reconstruction losses based on the original observation. TeCH produces
high-fidelity 3D clothed humans with consistent & delicate texture, and
detailed full-body geometry. Quantitative and qualitative experiments
demonstrate that TeCH outperforms the state-of-the-art methods in terms of
reconstruction accuracy and rendering quality. The code will be publicly
available for research purposes at https://huangyangyi.github.io/TeCH
Comment: Project: https://huangyangyi.github.io/TeCH, Code: https://github.com/huangyangyi/TeC
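The Score Distillation Sampling update used here (and in the other SDS-based works above) follows the predicted-noise-minus-injected-noise direction while skipping the denoiser Jacobian. The toy below replaces the diffusion model with a hand-made stand-in "denoiser" that believes a fixed vector `mu` matches the prompt; it is a didactic sketch of the update rule, not TeCH's multi-view pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([2.0, -1.0, 0.5])   # what the stand-in "diffusion prior"
                                  # believes matches the text prompt
x = np.zeros(3)                   # parameters being optimized
alpha_bar, lr = 0.5, 0.05

for step in range(200):
    eps = rng.normal(size=3)
    # forward diffusion of the current render/parameters
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1 - alpha_bar) * eps
    # stand-in denoiser: predicts the noise consistent with mu being clean
    eps_hat = (x_t - np.sqrt(alpha_bar) * mu) / np.sqrt(1 - alpha_bar)
    # SDS update: step along (eps_hat - eps), omitting the U-Net Jacobian
    x -= lr * (eps_hat - eps)

print(np.round(x, 2))
```

With this stand-in, the injected noise cancels analytically and `x` is pulled toward `mu`, which is exactly the role SDS plays in optimizing geometry and texture against a diffusion prior.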
Point Scene Understanding via Disentangled Instance Mesh Reconstruction
Semantic scene reconstruction from point clouds is an essential and
challenging task for 3D scene understanding. This task requires not only
recognizing each instance in the scene but also recovering its geometry from
the partially observed point cloud. Existing methods usually attempt to
directly predict occupancy values of the complete object based on incomplete
point cloud proposals from a detection-based backbone. However, this framework
often fails to reconstruct high-fidelity meshes, hindered both by spurious
false positive object proposals and by the ambiguity that incomplete point
observations pose for learning occupancy values of complete objects. To
circumvent this hurdle, we propose a Disentangled Instance Mesh
Reconstruction (DIMR) framework for effective point scene understanding. A
segmentation-based backbone is applied to reduce false positive object
proposals, which further benefits our exploration of the relationship between
recognition and reconstruction. Based on the accurate proposals, we leverage a
mesh-aware latent code space to disentangle the processes of shape completion
and mesh generation, relieving the ambiguity caused by the incomplete point
observations. Furthermore, with access to the CAD model pool at test time, our
model can also be used to improve the reconstruction quality by performing mesh
retrieval without extra training. We thoroughly evaluate the reconstructed mesh
quality with multiple metrics, and demonstrate the superiority of our method on
the challenging ScanNet dataset.
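Test-time mesh retrieval from a CAD pool reduces, in the simplest reading, to nearest-neighbour search in the mesh-aware latent space. The sketch below assumes unit-normalized latent codes and cosine similarity; both the 128-D codes and the retrieval metric are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mesh-aware latent codes: 500 CAD models, 128-D each,
# unit-normalized so a dot product is cosine similarity.
cad_latents = rng.normal(size=(500, 128))
cad_latents /= np.linalg.norm(cad_latents, axis=1, keepdims=True)

def retrieve(query, pool, k=3):
    """Return indices of the k pool entries most cosine-similar to `query`."""
    q = query / np.linalg.norm(query)
    sims = pool @ q
    return np.argsort(-sims)[:k]

# A query latent close to CAD model 42 should retrieve it first.
query = cad_latents[42] + 0.05 * rng.normal(size=128)
print(retrieve(query, cad_latents))
```

Because shape completion and mesh generation are disentangled through this latent space, swapping the generated mesh for the nearest CAD model needs no extra training, which is the property the abstract highlights.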
Efficient Text-Guided 3D-Aware Portrait Generation with Score Distillation Sampling on Distribution
Text-to-3D is an emerging task that allows users to create 3D content with
infinite possibilities. Existing works tackle the problem by optimizing a 3D
representation with guidance from pre-trained diffusion models. An apparent
drawback is that they need to optimize from scratch for each prompt, which is
computationally expensive and often yields poor visual fidelity. In this paper,
we propose DreamPortrait, which aims to generate text-guided 3D-aware portraits
in a single forward pass for efficiency. To achieve this, we extend Score
Distillation Sampling from a datapoint to a distribution formulation, which injects
semantic prior into a 3D distribution. However, the direct extension will lead
to the mode collapse problem since the objective only pursues semantic
alignment. Hence, we propose to optimize a distribution with hierarchical
condition adapters and GAN loss regularization. For better 3D modeling, we
further design a 3D-aware gated cross-attention mechanism to explicitly let the
model perceive the correspondence between the text and the 3D-aware space.
These elaborated designs enable our model to generate portraits with robust
multi-view semantic consistency, eliminating the need for optimization-based
methods. Extensive experiments demonstrate our model's highly competitive
performance and significant speed-up over existing methods.
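A gated cross-attention layer of the kind described, in which 3D-aware features attend to text tokens and the result is injected through a tanh gate, can be sketched as follows. The dimensions, random weights, and the zero-valued gate are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32                                   # feature width (illustrative)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gated_cross_attention(x, text, Wq, Wk, Wv, gate):
    """x: (n, d) 3D-aware features; text: (m, d) token embeddings.
    A tanh gate near zero blends the text branch in gradually."""
    q, k, v = x @ Wq, text @ Wk, text @ Wv
    attn = softmax(q @ k.T / np.sqrt(d)) @ v
    return x + np.tanh(gate) * attn      # gated residual injection

x = rng.normal(size=(16, d))             # 16 spatial features
text = rng.normal(size=(8, d))           # 8 text tokens
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
out = gated_cross_attention(x, text, Wq, Wk, Wv, gate=0.0)
print(np.allclose(out, x))               # zero gate leaves features untouched
```

Starting the gate at zero is a common stabilization trick for newly added cross-attention branches: the model initially behaves as if the text branch were absent, then learns how strongly to inject text correspondence into the 3D-aware space.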